A first RDF implementation of the COSMIC database on mutations in cancer

Authors

  • Achille Zappa
  • Paolo Romano
Abstract

Motivation and Objectives. Within a living organism, genome and proteome variations may influence many molecular interactions and biochemical pathways, impairing the proper activity of cells, tissues, and organs; ultimately, this may be the cause of many syndromes and diseases. It is now well known that tumors may arise as a result of a series of DNA sequence abnormalities and mutations. It is therefore not surprising that a vast amount of information is available in the scientific literature and that many information systems are devoted to managing the related data. Of particular interest among these are the many Locus Specific Databases (LSDBs) and the COSMIC (Catalogue of Somatic Mutations in Cancer) database (Forbes et al., 2011). Such data, however, are not yet sufficiently integrated with other molecular, biomedical, and clinical databases, and new efforts are needed in this direction.

Data retrieval, search, and integration solutions in bioinformatics increasingly rely on the set of standards and technologies that underpin the Semantic Web framework (Berners-Lee et al., 2001). This framework is intended to evolve the web into a distributed knowledge base, and a first step in this evolution is the generation of a Web of Data (Bizer et al., 2009). In this view, Linked Data is an approach to data integration that employs ontologies, terminologies, Uniform Resource Identifiers (URIs), and the Resource Description Framework (RDF) to connect pieces of data, information, and knowledge on the Semantic Web (Belleau et al., 2008). In particular, RDF describes semantically rich information on the web as a composition of simple triples (statements) of the form ('Subject', 'Property', 'Object'), which link entities through relations that are expressed using ontologies and identified by URIs (see the RDF reference site: http://www.w3.org/RDF/, last accessed on October 3, 2012).
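To make the triple model concrete, the following is a minimal sketch, using only the Python standard library, of how a single ('Subject', 'Property', 'Object') statement can be represented and written out in N-Triples syntax. The `http://example.org/...` URIs, the `affectsGene` and `label` properties, and the identifiers `M1` and `GENE_A` are hypothetical placeholders chosen for illustration; they are not taken from COSMIC or from the authors' RDF model.

```python
from typing import NamedTuple

# Hypothetical namespaces, used only for illustration.
BASE = "http://example.org/"
VOCAB = BASE + "vocab#"


class Triple(NamedTuple):
    """One RDF statement: subject and predicate are URIs; the object is
    either a URI or a plain string literal."""
    subject: str
    predicate: str
    obj: str
    obj_is_literal: bool = False


def to_ntriples(t: Triple) -> str:
    """Serialize a triple as one N-Triples line."""
    if t.obj_is_literal:
        # Escape backslashes and double quotes inside the literal.
        escaped = t.obj.replace("\\", "\\\\").replace('"', '\\"')
        obj = f'"{escaped}"'
    else:
        obj = f"<{t.obj}>"
    return f"<{t.subject}> <{t.predicate}> {obj} ."


# A resource-to-resource link and a resource-to-literal statement.
link = Triple(BASE + "mutation/M1", VOCAB + "affectsGene", BASE + "gene/GENE_A")
label = Triple(BASE + "mutation/M1", VOCAB + "label", "placeholder mutation", True)

if __name__ == "__main__":
    print(to_ntriples(link))
    print(to_ntriples(label))
```

Because both the subject and the object of the first statement are URIs, another dataset can refer to the same gene URI and the two datasets become linked without any shared schema.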
A relevant contribution to this vision comes from the conversion of data stored in relational databases (RDB) into RDF. A vast amount of information on human variation exists in the literature and in several mutation and variation databases but, to our knowledge, this kind of information is still scarce in the Web of Data. There are several motivations for using Semantic Web technologies and publishing life sciences datasets as Linked Data: doing so improves data and information integration, the shareability of openly accessible data through standard programmatic interfaces, semantic normalization, data discoverability, and query federation across distributed sources. A first work carried out by our group led to the implementation of an RDF version (Zappa et al., 2012) of the IARC TP53 Somatic Mutation database (IARCDB) (Petitjean et al., 2007). Here, we present the initial development of an RDF version of the COSMIC database by means of Semantic Web technologies.
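The general idea of converting relational data into RDF can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' conversion pipeline: the `mutation` table, its columns, the two rows of made-up data, and the `http://example.org/cosmic/` namespace are all hypothetical. Each row becomes a subject URI built from its primary key, and each column becomes a property of that subject.

```python
import sqlite3

# The rdf:type URI is standard; every other URI below is a hypothetical placeholder.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/cosmic/"


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table with made-up rows (not real COSMIC data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE mutation (id TEXT PRIMARY KEY, gene TEXT, cds_change TEXT)"
    )
    conn.executemany(
        "INSERT INTO mutation VALUES (?, ?, ?)",
        [("M1", "GENE_A", "c.1A>T"), ("M2", "GENE_B", "c.2G>C")],
    )
    return conn


def literal(value: str) -> str:
    """Quote a string as an N-Triples plain literal."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def rows_to_ntriples(conn: sqlite3.Connection) -> list[str]:
    """Map each row to three triples: a type, a link to a gene, and a literal."""
    lines = []
    query = "SELECT id, gene, cds_change FROM mutation ORDER BY id"
    for mut_id, gene, cds_change in conn.execute(query):
        subject = f"<{BASE}mutation/{mut_id}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{BASE}vocab#Mutation> .")
        # The foreign-key-like column becomes a link to another resource.
        lines.append(f"{subject} <{BASE}vocab#gene> <{BASE}gene/{gene}> .")
        # A plain value column becomes a literal.
        lines.append(f"{subject} <{BASE}vocab#cdsChange> {literal(cds_change)} .")
    return lines


if __name__ == "__main__":
    for line in rows_to_ntriples(build_demo_db()):
        print(line)
```

In practice such mappings are usually declared with a dedicated RDB-to-RDF mapping tool rather than hand-written; the sketch only shows the row-to-subject and column-to-property correspondence.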

Similar resources

The Investigation of Mutations and Comparison of Leptin Gene Pro-Motor in Najdi Cattle with the Database NCBI Sequences

Objective: Identifying the genetic aspects and major gene influences on energy balance, milk production, fertility, food safety, and the consumer is among the recent interests of genetics and breeding researchers. Methods: Najdi cattle are among the most prominent breeds in Khuzestan province. For this study, blood samples were taken from 15 Najdi cattle at the Shoushtar Najdi Cattle Station. DNA was extracted from wh...

Neuron Mathematical Model Representation of Neural Tensor Network for RDF Knowledge Base Completion

In this paper, a state-of-the-art neuron mathematical model of the neural tensor network (NTN) is proposed for the RDF knowledge base completion problem. One of the difficulties with the parameters of the network is that a representation of its neuron mathematical model is not possible. For this reason, a new representation of this network is suggested that solves this difficulty. In the representation, th...


Linked Functional Annotation For Differentially Expressed Gene (DEG) Demonstrated using Illumina Body Map 2.0

Semantic Web technologies are core to the integration of disparate data resources. They can be used to exploit data from next generation sequencing (NGS) for therapeutic decisions regarding cancer. In this manuscript, we describe how different data resources, which inform on the expression of specific genes in a tissue and its variants, can be brought together to indicate a risk for tissue-speci...


Data mining using the Catalogue of Somatic Mutations in Cancer BioMart

Catalogue of Somatic Mutations in Cancer (COSMIC) (http://www.sanger.ac.uk/cosmic) is a publicly available resource providing information on somatic mutations implicated in human cancer. Release v51 (January 2011) includes data from just over 19,000 genes, 161,787 coding mutations and 5573 gene fusions, described in more than 577,000 tumour samples. COSMICMart (COSMIC BioMart) provides a flexib...


Publication date: 2012